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Abstract 

Thanks to the reduced cost of RNA-Sequencing and other advanced methods for quan- 
tifying expression levels, accurate and expansive comparative expression data sets including 
data from multiple individuals per species are emerging. Comparative genomics has been 
greatly facilitated by the availability of statistical methods considering both between and 
within species variation for testing hypotheses regarding the evolution of DNA sequences. 
Similar methods are now needed to fully leverage comparative expression data. In this pa- 
per, we describe the j3 model which parameterizes the ratio of population to evolutionary 
expression variance, facilitating a wide variety of analyses, including a test for expression 
divergence or diversity for a single gene or a class of genes. The j3 model can also be used 
to test for lineage-specific shifts in expression level, amongst other applications. We use 
simulations to explore the functionality of these tests under a variety of circumstances. We 
then apply them to a mammalian phylogeny of 15 species typed in liver tissue. We iden- 
tify genes with high expression divergence between species as candidates for expression level 
adaptation, and genes with high expression diversity within species as candidates for expres- 
sion level conservation and plasticity. Using the test for lineage-specific expression shifts, we 
identify several candidate genes for expression level adaptation on the catarrhine and human 
lineages, including genes possibly related to dietary changes in humans. We compare these 
results to those reported previously using the species mean model which ignores population 
expression variance, uncovering important differences in performance. 

1 Introduction 

Comparative expression studies including variation within species are now emerging (Kalinka 
et al., 2010; Brawand et al., 2011; Perry et al., 2012), made possible by the developments of 
both RNA-Seq as a reliable method to quantify expression (Wang et al., 2009), and bioinformatic 
methods to appropriately normalize expression accounting for species differences, (Bullard et al., 
2010; Trapnell et al, 2010; Li and Durbin, 2009; Pickrell et al, 2010). 

Initially, comparative expression analyses considered few species and employed adaptations 
of standard statistical methods (particularly ANOVA) to detect genes with unusually high ex- 
pression divergence between species given the variance within species (Nuzhdin et al., 2004; 
Gilad et al., 2006; Khaitovich et al., 2006; Whitehead and Crawford, 2006). These methods 
typically assume independence between species. However, as more species are considered, the 
difference in shared evolutionary history between closely and distantly related species increases, 
and a complex covariance structure emerges. In current comparative expression datasets across 
larger phylogenies, the assumption of species independence does not hold, necessitating more 
sophisticated methods taking into account evolutionary relationships (Felsenstein, 1985). 

Subsequently, a number of models have been developed to describe the evolution of gene 
expression under neutral drift, stabilizing selection, or directional selection (Butler and King, 
2004; Bedford and Hartl, 2008). These models are used to calculate the expected species average 
expression levels and expression covariance between species under a particular evolutionary 
scenario. Likelihood ratio (LR) tests can then be formulated to distinguish neutrality, stabilizing 
selection, and directional selection, as has been successfully analyzed in a number of datasets 
(Bedford and Hartl, 2008; Kalinka et al., 2010; Perry et al., 2012). These methods have been 
augmented to allow for within species expression level variance (Lynch, 1991; Felsenstein, 2008; 
Hansen and Bartoszek, 2012; Rohlfs et al, 2014). 
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We build upon these models to create a the j3 model, describing expression evolution be- 
tween species and variance within species. The j3 model facilitates rigorous analyses investigating 
hypotheses of expression level conservation, adaptation, and plasticity. Our model allows anal- 
yses of expression variance between and within species, analogous to ANOVA style comparative 
expression methods, but accounting for phylogenetic structure. 

The model and parameterization we propose can be used for an expression analogy to classic 
genetic neutrality tests considering polymorphism and diversity, namely, the HKA (Hudson 
et al., 1987) and McDonald-Kreitman (McDonald and Kreitman, 1991) tests. In these tests, 
the amounts of relative amounts polymorphism within species and divergence between species 
at synonymous and non-synonymous sites within a gene are compared. Under neutrality, the 
ratio of synonymous to non-synonymous differences will be the same in the polymorphism and 
divergence data. However, in genes affected by selection, that ratio may be higher or lower for 
divergence than polymorphism data, depending on the directionality of selection. Analogously, 
in our model, we parameterize the ratio of within to between species expression variance across 
genes as (3 s hared- Genes with an unusually high ratio (/% > f3 s hared) show proportionally high 
expression divergence and may be subject to directional selection on expression level. Genes 
with an unusually low ratio (/3j < Shared) show proportionally high expression diversity and 
may have conserved expression levels which are plastic or environmentally responsive. This test 
can alternatively be thought of as a phylogenetic ANOVA, as it accounts for varying relationships 
between species. 

By selectively constraining parameters of the /3 model, a variety of additional tests can be 
constructed. For example, we can test for unusual species or lineage-specific expression vari- 
ance, as may be observed under recent relaxation of constraint on expression level, diversifying 
selection on expression level, increased constraint on gene expression level, or under extreme 
branch-specific demographic processes. Other tests may be constructed to test for differing 
expression diversity for groups of individuals within each species, for example, evolutionarily 
conserved age or sex-specific expression variance. All of these tests could be performed on a 
particular gene of interest or on a class of genes interest, for example a list of candidate genes 
could be queried for increased expression diversity in older individuals. In addition to these 
novel tests, the /3 model can be used for the same tests as other expression evolution models 
which discount within-species variance. In particular, the ft model can test for lineage-specific 
shifts in expression level, while taking into account within-species variance. 

Here, we explore the performance of two tests: the test for unusual expression divergence or 
diversity and the test for lineage-specific expression level shifts. We use simulations to describe 
these tests and formulate expectations under the null hypotheses We then apply the tests to a 
previously published expression dataset typed which includes 15 mammals. We identify a num- 
ber of genes with high expression level divergence between species as candidates for expression 
level adaptation, and genes with high expression level diversity within species as candidates 
for environmentally-responsive gene expression (plasticity). Using the test for lineage-specific 
expression shifts, we identify several viable candidate genes for expression adaptation on the 
catarrhine and human lineages. For comparison, we consider the species mean model described 
by Bedford and Hartl (2008) and recently used in a number of practical analyses (Bedford and 
Hartl, 2008; Kalinka et al., 2010; Perry et al., 2012). We compare our results to those ob- 
tained using the species mean method, noting important differences, especially for analyses of 
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species-specific expression shifts (Perry et al., 2012). 

2 Results 

2.1 The 0 model for gene expression evolution and population variance 

The evolution of quantitative traits by drift and stabilizing selection has been modeled using 
an Ornstein-Uhlenbeck (OU) process, which can be thought of as a random walk with a pull 
towards an optimal value (Hansen, 1997; Butler and King, 2004; Hansen et al., 2008; Bedford 
and Hartl, 2008; Kalinka et al., 2010). In an OU model of stabilizing selection on gene expression 
level, the parameter 9i can be thought of as the optimal expression level for gene i, of the degree 
of drift acting that expression level, and ctj strength of selective stabilization on that expression 
level. Over evolutionary time, the stationary variance of species mean expression levels for gene 

2 

i will be 2^-, which we refer to as the evolutionary variance. 

More recently, several OU-based models have been augmented to include within-species 
population level variance (Felsenstein, 2008; Lynch, 1991; Hansen and Bartoszek, 2012; Rohlfs 
et al., 2014). Accounting for population variance is crucial to distinguish non-phylogenetic 
variation, total drift, and stabilizing selection (Rohlfs et al., 2014). 

The model we describe builds on these OU models for quantitative trait evolution with the 
additional parameter /3 which describes the ratio of population to evolutionary variance. Within 

2 

species j the expression level of any individual k is distributed as Yjk ~ N(Yj,f3^), where Yj 
is the species mean expression level determined by the OU process. We call this the f3 model, 
which describes a linear relationship between population and evolutionary variance. 

In his classic paper, Lande (1976) showed that under an OU model of stabilizing selection, a 
linear relationship arises between the evolutionary variance and the population variance within 
species. Additionally, the Poisson nature of RNA-Seq and gene expression itself means that both 
evolutionary and population expression variance increase with expression mean. With that in 
mind, our model assumes a linear relationship between evolutionary and population expression 
variance. That assumption is reflected in the data, which shows a linear relationship between 

estimated evolutionary variance (j^f:) and estimated population variance (Pi-^-) (Figure 1). 

The slope of this linear relationship (parameterized by (3) should be consistent across genes 
which have undergone the same evolutionary and demographic processes under stabilizing se- 
lection. However, in a gene, i, which have experienced directional selection, driving increased 
divergence between species, while maintaining low variance within species, /3j would be lower as 
compared to other genes in the same individuals. Similarly, a gene with plastic expression level 
may have more variation within species than between as compared to other genes, raising the 
value of fii. 

2.2 Testing for unusual expression divergence and diversity 

Using the (3 model, we can test each gene for unusual expression divergence between species or 
diversity within species. This test amounts to a gene expression analogy to the HKA or MK 
tests comparing the amount of polymorphism within species to the divergence between species 
to identify deviations from neutrality. 
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Specifically, the likelihoods of a set of genes can compared under the null model where (3 
for a particular gene i is constrained to the value of j3 estimated over the entire set of genes 
(Pshared) (Hq : ft = (3 s hared) and under the alternative model where ft is allowed to vary 
(H a : ft ^ Pshared)- By comparing these likelihoods in a LR test, we can determine if ft for a 
particular gene varies significantly from /3 'shared across the genes. A gene where ft < /3 shared has 
high expression variance between species as compared to within, or high expression divergence. 
A gene where ft > ^shared has high expression variance within species as compared to between, 
or high expression diversity. 

2.2.1 Test expectation under the null hypothesis 

At the asymptotic limit, the LR test statistic for testing Hq : ft = /3 shared) versus Ha = ft ^ 
Pshared), LRT^^ shared ^ s Xi distributed under the null hypothesis. However, when applied to 
small phylogenies, the distribution of LRTr^r shartsd may not be near the asymptotic limit, and 
may deviate from a \\ (Boettiger et al., 2012, e.g.,) (see Supplementary Materials). To explore 
the null distribution of LRTp.^p over different parameter values and phylogeny sizes, we 
simulated data under the null hypothesis of ft = j3 s hared for four sets of parameter values (Sup- 
plementary Table 1) based on the median maximum likelihood estimates from the experimental 
data, under four tree sizes based on the mammalian phylogeny that we subsequently will analyze 
(Supplementary Figure 1 and Supplementary Materials). 

While the null distribution resembles the asymptotically expected xl f° r a phylogeny like the 
one analyzed here, we observe some minor deviations (Supplementary Figure 2). However, as 
the size of the phylogeny considered increases, the null distribution approaches a \\-> though it 
converges more slowly under some parameter values. As in previous studies examining parameter 
estimates over phylogeny size (Boettiger et al., 2012), we see that the parameter estimates 
improve with phylogeny size, though some are more easily estimable than others (Supplementary 
Figures 5-10). 

We performed further simulations based on a fish bone (also called fully pectinate or ladder) 
phylogeny for different numbers of species (Supplementary Figures 3, 11). Again, we see that as 
the phylogeny size increases, the simulated null distribution more closely matches the asymptotic 
expectation. It is important to note that the null distribution under a fish bone topology 
more quickly approaches \\ than the other topology, because there are more varying branch 
lengths between species in a fish bone phylogeny. Trait evolution methods are powered by 
multiple varying branch length differences between species, making a fish bone phylogeny the 
most informative. 

2.2.2 Parametric bootstrap approach for the null distribution 

To account for deviations from the asymptotically expected null distributions of LRTp.^p shar£i , 
we follow the suggestion of Boettiger et al. (2012) and use a parametric bootstrap. That is, for 
a particular gene, we simulate expression profiles based on the maximum likelihood parameter 
estimates under the null hypothesis. These simulated expression profiles are then tested for 
deviation from the null hypothesis to determine the parametric bootstrapped null distribution 
of LRT/3 i -£/3 3hared ,to which the experimental result can be compared. 
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We performed a parametric bootstrap analysis with 100 simulations for each of the genes 
simulated under the null hypothesis described above. For each gene, we compared the origi- 
nal test statistic {LRTp^p ghar£d ) to the distribution created by these additional simulations to 
determine the parametric bootstrapped p-value. The resulting bootstrapped p-values are ap- 
proximately uniformly distributed between 0 and 1 (Supplemental Figure 12) as expected. Note 
that generally the parametric bootstrap approach is most effective for accurate parameter esti- 
mates; in the presence of biased estimates and a dependence of the distribution of the LR test 
statistics on parameter values, the parametric bootstrap approach can be biased. It is therefore 
worthwhile to test the parametric bootstrap before interpreting results based on it. 

2.3 Expression divergence and diversity in mammals 

2.3.1 Mammalian expression data 

We applied the f3 model to analyze a comparative expression dataset over 15 mammalian species 
with about four individuals per species (Perry et al., 2012). Of the 15 species typed, five are 
anthropoids (common marmoset (mr), vervet (ve), rhesus macaque (mc), chimpanzee (ch), hu- 
man (hu)), five are lemurs (aye-aye (ay), Coquerel's sifaka (sf), black and white ruffed lemur 
(bw), mongoose lemur (mn), and crowned lemur (cr)), and the remaining five are more distantly 
related mammals (slow loris (si), northern treeshrew (ts), house mouse (ms), nine-banded ar- 
madillo (ar), and gray short-tailed opossum (op)). Liver tissue from each individual was typed 
using RNA-Seq (Perry et al., 2012). We consider a subset of 675 genes with no missing data 
across all species and individuals. 

2.3.2 Assessing expression divergence and diversity 

We applied the test for unusual expression divergence between species or unusual diversity within 
species to each gene in the mammalian dataset. The resulting empirical LRTp^p sh . values 
increase with departure from = fishared 

(Figure 2). We see much higher values of LRT^p shared 
for low Pi than high This is partially explained by error in /3j estimates, especially for 
higher values (Supplementary Figures 5, 11). Additionally, under the null hypothesis, some of 
the observed expression variance may be explained by increasing the estimated evolutionary 
variance, so power is reduced for genes with high /3j. 

We additionally estimated parametric bootstrapped p- values using 1000 simulations for each 
gene, finding that they roughly follow a uniform distribution with some excess of low p-values 
(Supplementary Figure 14). We compared those bootstrapped p- values to LRTp iZ £j} shared and 
found a clear correlation (Supplementary Figure 15). Using 1000 simulations, the minimum 
p-value is 0.001, so more simulations would be needed to more accurately assess the degree of 
departure from the null distribution in the tail of the distribution. 

2.3.3 Candidate genes for expression adaptation and plasticity 

Genes in the tail of the LRTp.^p ahar ^ d distribution with high j3 have unusually high popula- 
tion, as compared to evolutionary, variance, which may be indicative of conservation of gene 
expression level across species, but with plastic gene expression levels responding to individual 
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environmental conditions. Among the most significant high /3j genes, we see PPIB which has 
been implicated in immunosuppression (Price et al., 1991; Luban et al., 1993) and HSPA8, a 
heat shock protein (Daugaard et al., 2007) (Figure 3a). Based on their function, the expression 
levels of both of these genes are expected to vary depending on environmental inputs such as 
pathogen load and temperature. 

Conversely, genes with low j3 have unusually high evolutionary variance as compared to 
population variance, which is expected in cases of directional selection on expression level. The 
most extreme outlier with low is F10, which encodes Factor X, a key blood coagulation 
protein produced in the liver (Uprichard and Perry, 2002). F10 is highly expressed in armadillo 
as compared to the other mammals considered (Figure 3b). This observation is likely related to 
the fact that armadillo blood coagulates twice or five times faster than human blood, perhaps 
due in part to higher expression of F10 (Lewis and Doyle, 1964). This functional connection to 
increased F10 expression suggests that the shift may be adaptive. 

3 Testing for branch-specific expression level shifts 

The f3 model can be used to formulate hypotheses about branch-specific shifts in the expression 
of gene i by comparing likelihoods under H 0 : 8f = 0 r } on ~ a versus H a : 0f / Qnon-a^ w j iere go, 
is the value of 0i at all nodes in the shifted lineage(s), a, and Q^ on ~ a \ s the value of 0, at the 
remaining (non — a) nodes. The corresponding LR test statistic is asymptotically Xi distributed. 
The phylogeny used for these analyses seems sufficient to achieve that asymptotic distribution. 
(Supplementary Figure 16). We performed this test querying expression level shift on both the 
catarrhine (containing humans, chimpanzees, rhesus macaques, and vervets) and human lineages 
(Supplementary Tables 2, 3). 

3.1 Candidate genes for adaptation on catarrhine and human lineages 

In the test for expression shift in catarrhines (cat), we identify a number of interesting outliers 
(Supplementary Figure 17). The most significant shift is seen in DEXI, showing a clear increase 
in expression level in catarrhines. High expression of DEXI has recently been shown to be 
protective against auto- immune diseases type I diabetes and multiple sclerosis (Davison et al., 
2012), indicating an important functional role of this increased expression. 

Similarly, the test for expression shift on the human (hum) branch revealed interesting out- 
liers (Supplementary Table 3, notably, two genes linked to fat metabolism or obesity. In the 
extreme tail of the distribution, we detected human-specific increased expression of MGAT1, 
which aides in metabolism of fatty acids to triglycerides (Yen et al., 2002), and the expression 
of which has been associated with excess retention of lipids (Lee et al., 2012). Additionally, 
we see that TBCA, a tubulin cofactor which assists in the folding of /3-tubulin (Tian et al., 
1996), has increased expression in humans. Given that reduced expression of TBCA through 
a heterozygous deletion has been associated with childhood obesity in humans (Glessner et al., 
2010), it could be that the human-specific increase in TBCA expression assists in metabolism of 
a higher fat diet. However, in both cases, it is unclear if the increased expression in humans is 
an evolutionary shift in expression, helping to adapt to a more fat-rich diet, or if the increased 
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expression in humans is environmentally responding to a fat-rich diet. Expression level stud- 
ies can only distinguish between these alternatives if the environmental conditions have been 
controlled between study objects, which may be hard or impossible to achieve when comparing 
humans to other mammals. 

Another gene with a significant expression shift in humans is BCKDK. BCKDK inactivates 
the branched-chain ketoacid dehydrogenase (BCKD) complex, which catalyzes metabolism of 
branched-chain amino acids (BCAAs). Nonsense and frameshift mutations in BCKDK have 
recently been linked to low levels of BCAAs and a phenotype including autism and epilepsy 
(Novarino et al., 2012). The observed increased human BCKDK expression may be adaptive 
to slow the metabolism of BCAAs so they can be processed into neurotransmitters (Novarino 
et al, 2012). 

3.2 Comparing results using the (3 model and species mean model 

We compared our results for the expression shift tests to those reported in a side analysis by 
Perry et al. (2012) using the species mean model described by Bedford and Hartl (2008). The 
distributions of LRT gcat ^ d no7i~cat and LRT^ um ^ s „ on -hum from that analysis deviate substantially 

from the xi distribution expected under the null hypothesis (Supplementary Figure 19). This 
could be due to a number of possible numerical, optimization, or book-keeping errors, as these 
methods require a number of important technical considerations. In a comparison of the rank of 
expression shift test statistics as computed by Perry et al. (2012) and as computed using the f3 
model, we see a general lack of correlation with some similarity in the extreme outliers discussed 
in that paper (Supplementary Figure 20). 

As the species mean model is a reduction of the j3 model where each species has a single 
"individual" which is set to the species mean and f3 is fixed at zero, we implemented the species 
mean model to test for expression shifts on the catarrhine and human lineages. In our implemen- 
tation, we see that the empirical distribution of test statistics are approximately Xi distributed 
with some excess of high values (Supplementary Figure 21) and a much improved correlation 
to (3 model test statistics (Figure 4). While both models identify similar genes with branch- 
specific 9i shifts, we see much higher correlation between models for a shift on the catarrhine 
lineage than on the human lineage (Figure 4). Since the species mean model discards individual 
expression levels in favor of the species mean, it may identify genes where the mean expression 
appears to have shifted, even if the degree of variance may make that shift seem less extreme. 
By the same token, the f3 method may identify genes with a shift that can not be explained by 
the expected within-species variance. This difference is most pronounced when considering shift 
of a single species (such as humans) where considering variance within that single species may 
alter the perception of an expression shift. 

Figure 5 shows the three genes with the biggest difference in value of LRT Bhum , Q non-hum 

i ' i 

between the j3 and species mean models, that is, the genes that are most clearly identified by 
one model, while missed by the other. The gene TBCA, discussed above as a candidate for diet- 
associated expression adaptation, is a clear outlier under the (3 model (LRT Qhum 

°TBCAr°TBCA 

9.5), but is less easily identified using the species mean model (LRT„ hum /nnon — hum — O.O). 

TBCA^TBCA 
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4 Discussion 

We have described the (3 model for gene expression evolution which parameterizes the ratio 
between population and evolutionary variance as f3 so that, in addition to more classic tests 
for selection on gene expression level, diversity to divergence tests may be formulated. We 
have explored a test for gene-specific showing that the null distribution of the test statistic 
LRTp t -Lp shared is asymptotically Xi> though depending on the size of the dataset and the value 
of the parameters, the null distribution may not have converged to the asymptote. We show 
that in these parametric bootstrap approach can be used to more accurately assess 

the significance of LRTp.+p sh values. Since the parametric bootstrap may be sensitive to 
variance in parameter estimates, it is prudent to verify its effectiveness on a particular dataset 
with simulations before using it to interpret data. 

The test for gene-specific /3j can be thought of as a phylogenetic ANOVA, or as a gene 
expression analog to the HKA or MK tests. This enables a previously unavailable line of inquery 
into gene expression divergence, which may be indicative of expression-level adaptation, and 
gene expression diversity, which may be indicative of plastic expression levels responding to 
environmental conditions. The utility in the comparative method in this setting is that it allows 
us to distinguish between genes which have high variance in expression levels simply because 
expression of this gene has little effect on fitness, and genes in which expression varies because 
the gene mediates a plastic response to the environment. 

In applying the gene specific fii test to a mammalian dataset, we identified several candidates 
for expression level adaptation, most notably high expression of F10 in armadillos, which may 
be linked to their phenotype of rapid blood coagulation. We additionally identified several 
candidate genes for environmentally-responsive expression levels including PPIB, which helps 
regulate immunosuppression, and HSPA8, a heat shock protein. 

In addition to this novel test for unusual population or evolutionary variance, we used the 
/3 model to test for branch-specific shifts in expression level, as had been done previously with 
the species mean model. We found an increase in DEXI expression in catarrhines, which may 
have an adaptive role in auto-immune regulation. In humans, we found increased expression of 
two genes thought to be involved in lipid metabolism (MGAT1 and TBCA) and of BCKDK, the 
low expression of which has been linked to BCAA (necessary for neurotransmitters) deficiency, 
epilepsy, and autism. 

When comparing our lineage-specific expression shift results to those previously reported us- 
ing the species mean model, we observed startling differences. We've attributed these differences 
to a numerical or optimization problem in that original analysis, highlighting the importance of 
carefully addressing these issues. We performed an additional analysis using the species mean 
model to create a fair comparison. From that secondary analysis, we observe important differ- 
ences between the /3 model and species mean model, most notably when testing for a shift in a 
single species. By discarding population variance, the species mean model may mistake a mild 
expression shift attributable to expected within-species variance for an evolutionary shift. We 
see this illustrated by the identification of an expression shift in humans for TBCA using the (5 
model, but not using the species mean model. 

By changing parameter constraints, the /3 model can be used to test a variety of hypotheses. 
For example, branch-specific f3 values, which may be expected under branch-specific tightening 
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or relaxation of constraint, or under unusual branch-specific demographic processes. The f3 
model could also be used to test hypotheses of gene class-specific (rather than gene-specific) 
j3 values, which may vary based on gene class function. For example, genes involved in stress 
response may have a higher j3 value than housekeeping genes. 

Like all comparative expression methods, the /3 method applies to any heritable quantitative 
trait with environmental components, including metabolomics (Nicholson and Lindon, 2008; 
Cui et al., 2008; Sreekumar et al., 2009) and genome-wide methylation (Pokholok et al., 2005; 
Pomraning et al., 2009). 

As larger expression and other quantitative trait comparative datasets emerge, the versatile 
(3 model and framework described here will facilitate a wide variety of sophisticated analyses. 

5 Methods 

5.1 Data likelihood under (3 model 

The /3 model is similar to other OU process based evolutionary models (Butler and King, 2004; 
Bedford and Hartl, 2008), with the addition of within-species variance in terms of the evolu- 
tionary variance. As such, under the /3 model, expression levels across individuals and species 
follow a multivariate normal distribution identical to those under species means models at the 
species level as 

Varfr) 
CoviY^Yj) 

where Yi is the expression level in species i, Y p is the species mean expression at internal parental 
node p, 6i, a 2 , and ai are the parameter values on the branch leading to node i, t{ p is the length 
of the branch between i and p, Y a is the expression level at the most recent common ancestor 
of species i and j, and lij is the set of nodes in the lineage of Yi not in the lineage of Yj (Rohlfs 
et al, 2014). 

This multivariate normal distribution describing the species-level expression is augmented 
in the /3 model to include individuals within species, so for an individual k in species i, Yi k ~ 

2 

N(Yi, fiij 1 -)- In this way, the within species variance parameter described by Rohlfs et al. 

(Rohlfs et al., 2014) r 2 is re-parameterized as /3«^-. The entire multivariate normal distribution 
can be described as 

E{Y ik ) 
Var(Y lk ) 

Cov(Y ik ,Y u ) 
CoviY^Yfl) 



= E(Y p )e~ aiU P + 0i(l - e ~ a ^) 

2 

= ^-(1 - e - 2a * u p) + Var(Y p )e~ 2a ^ 

lOLi 

= Var(Y a )exp(- a k t k - a k t k ) 



= E(Y) 

2 

= Var(Y i ) + f3 l ^- 
2ai 

= Var{Yi) 
= Cov(Y l ,Y :i ) 
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where i 7^ j and k 7^ I. With the distribution of expression levels under a particular set of 
parameters defined according to this multivariate normal, the likelihood of the data under the 
model is simply the probability density. Notice that sampling and experimental variance is 
accounted for (and confounded) in the parameters governing the distribution of Y,ik\Yi. 

5.2 Maximum likelihood procedures 

For the test for individual gene departures from /5 'shared-, under the null hypothesis each gene i is 
governed by parameters Oi, of, and a>i, reflecting the evolutionary process of each gene based on 
its degree of stabilizing selection and drift. The population variance in all genes is controlled by 
the single parameter ft shared- To more computationally efficiently maximize likelihood over these 
3n + 1 parameters, we use a nested structure with Brent's method in the outer loop to maximize 
over the single parameter f3 s hared, an d the BFGS algorithm in the inner loop to optimize over 
Gi, of, and on for each gene. Under the alternative hypothesis, the likelihood of each gene i 
is maximized using the BFGS algorithm over Oi, of, aij, and fy. Then the likelihoods of each 
individual gene under the null and alternative hypotheses are compared to compute the LR. 

Note that in the likelihood maximization under the null hypothesis, likelihoods across genes 
are assumed to be independent so that for a particular value of j3 s hared the likelihood of a set of 
genes is simply the product of the likelihoods of each gene. While this assumption is currently 
typical in this sort of analysis, it leaves something to be desired since the evolution of expression 
levels of inter-related genes are not independent, and nor are the particular expression levels 
measured in an individual which may be responding to the environment of that individual. A 
more rigorous approach would take into account complex correlation structures across genes, 
as has been outlined for some evolutionary models (Lande and Arnold, 1983; Felsenstein, 1985, 
1988; Lynch, 1991). 

For the test of branch-specific expression shift for a particular gene i, under the null hypoth- 
esis the likelihood of each gene i is maximized over Oi, of, ati, and using the BFGS algorithm. 
Under the alternative hypothesis, the likelihood of each gene i is maximized over Of, #™ on_a 5 a ^ 
Oii, and ^, again, using the BFGS algorithm. 
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Figure Legends 



Figure 1. The maximum likelihood estimated per-gene evolutionary variance (^-) and 

2 

population variance (A ^4-) are plotted against each other. The linear regression line is shown. 



Figure 2. The test for a gene with varying from /5 'shared was computed for each gene. Those 
LR test statistics (LRTp^p^^^) are plotted against the log of the /3 parameter estimated for 
each gene (log (fa)) in a volcano plot. The dashed line indicates the value of /3 shared- 
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Figure 3. Each plot shows the expression profile across the 15 species for a genes in the 
extreme tails of the empirical distribution of the test statistic for a gene-specific j3 differing 
from f3 shared (LRT^i3 shaTed ). (a) shows genes with high values and (b) shows genes with 
low fa values. 



Figure 4. Each plot shows (a) LRT ecat _^_Qnon — cat cUld (t)) L RTqI IU7 - ii _^_Qnon— hum calculated using 

the (3 model (y-axes) and species mean model (x-axes) as implemented in this analysis. The 
line indicates x = y. 



Figure 5. Each plot shows the expression profile for genes identified with an expression shift 
in humans by the (3 model, but not by the species mean (SM) model (top row), and identified 
by the species mean model, but not by the j3 model (bottom row). Expression levels in 
humans are highlighted in pink. Each plot shows LRT„ hum / nnon — hum (as LRT) as computed 
under the (3 and species mean models. 
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Figure 3 
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Figure 4 
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